Academic Open Internet Journal ISSN 1311-4360
Volume 23, 2008

Measuring Quality Attributes of Web-based Applications
Part-II: Analysis and Models

E-mail: rsdhawan@rediffmail.com
E-mail: rsagwal@rediffmail.com
Abstract:
Both the review and the design of the effort assessment were discussed in Part-I of this paper. This paper (Part-II) explores the variation of the effort estimate through the analysis and implementation of Web-based applications, using the effort assessment of Part-I. Here we propose a simple but efficient approach to estimating the effort needed for designing Web-based applications with the help of the RS Web Application Effort Assessment (RSWAEA) model discussed in Part-I. The RSWAEA model was designed after carrying out an empirical study with the students of an advanced university class and with web designers who used various client-server based Web technologies. Our first aim was to compare the relative importance of each Web-based design model. Second, we studied the quality of the resulting designs based on the construction of a User Behavior Model Graph (UBMG) to capture the reliability of Web-based applications. The results obtained from these assessments help to analytically identify the effort and the failure points in Web systems, and make the evaluation of the reliability of these systems simple.
1. Introduction
Reliable and precise effort assessment of high-volume Web software is critical for project selection, planning and control. Over the past thirty years, various estimation models have been developed to help managers perform estimation tasks, and this has led to a market offering of estimation tools. For organizations interested in using such estimation tools, it is crucial to know the predictive performance of the estimates such tools produce. The construction of an estimation model usually requires a set of completed projects from which a mathematical model is derived and subsequently used as the basis for the estimation of future projects. There is, therefore, a need for an estimation model for the development effort of Web projects. In this paper we point out the need for predictive metrics to measure the development effort of Web-based applications. Finally, the Web-based characteristics and parameters are used to predict the effort and duration of Web systems development. A new database has been built on the basis of a hypothetical study, conducted by providing a dataset of Web documents. An empirical study was carried out to provide effort assessment for small to large-size Web-based applications. For this paper, we analyzed many findings drawn from an experience questionnaire. The results are augmented by the questionnaire answers from a survey of the students of an advanced university class and of web designers. Our analyses suggest several areas (including reliability, usability, complexity, cost, time requirements and the nature of the Web design) where Web designers, engineers and managers would benefit from better guidance about the proper implementation of Web-based applications.
The techniques we propose have the following key objectives:
1. Derive the UBMG in a manner that captures complete details of valid sessions and the number of occurrences of invalid sessions. The valid sessions carry metrics such as session count, reliability of the session, probability of occurrence of the session, and transition probabilities of the pages in the session.
2. Derive the RSWAEA method to estimate the development effort of small to large-size projects, especially in scenarios that require fast estimation with little historical information. On the basis of the RSWAEA method, the Web-based software effort estimates are examined with respect to user cost, cost drivers, Data Web Object compatibility, usability, maintainability, complexity, configuration, time requirements, and interfaces.
3. Qualities of a good software metric
Lord Kelvin once said that when you can measure what you are speaking about and express it in numbers, you know something about it. Measurement is fundamental to any engineering discipline. The terms "measure", "measurement" and "metric" are often used interchangeably, but according to Pressman [1] a measure provides a quantitative indication of the extent, amount, dimensions, capacity or size of some attribute of a product or process, and measurement is the act of determining a measure. The IEEE Standard Glossary of Software Engineering Terms [2] defines a metric as "a quantitative measure of the degree to which a system, component, or process possesses a given attribute". Ejiogu [3] suggested that a metric should possess the following characteristics: (a) Simple and computable: it should be easy to learn how to derive the metric, and its computation should not be effort- and time-consuming; (b) Empirically and intuitively persuasive: the metric should satisfy the engineer's intuitive notion of the product under consideration, rising and falling appropriately under various conditions; (c) Consistent and objective: the metric should always yield unambiguous results, and a third party should be able to derive the same metric value from the same information; (d) Consistent in its use of units and dimensions: it uses only measures that do not lead to bizarre combinations of units; (e) Programming-language independent; (f) An effective mechanism for quality feedback. In addition to the above characteristics, Roche [4] suggests that a metric should be defined in an unambiguous manner. According to Basili [5], metrics should be tailored to best accommodate specific products and processes.
4. Implementation and Analysis of the User Behavior Model Graph (UBMG)
A UBMG can be represented in the form of a graph or a matrix [6]. In the graph view, nodes represent the pages and arcs represent transitions from one node to another. In the matrix representation, each cell (i, j) holds the probability of a transition from page i to page j. We extend the UBMG by adding an additional node to the graphical view, and an additional column in the matrix view, to represent errors encountered while traversing. The construction of the UBMG starts with the navigational model and the access logs, as described in [7], where the navigational model gives a complete overview of the different pages and of the flow between the pages in the Web system. The access logs store information such as the timestamp, page accessed, client-id, referrer-id and HTTP return code, which is used for determining session information.
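To make this extension concrete, the following is a minimal Python sketch, written for this edition as an illustration only (it is not the authors' implementation), of a UBMG stored as a transition-probability matrix whose last column represents the added error node; the page names and probabilities are hypothetical.

# Illustrative sketch of the extended UBMG: a transition-probability matrix
# whose last column represents the added error node. Page names and values
# are hypothetical, chosen only to mirror the structure described above.

PAGES = ["a", "b", "c"]           # ordinary pages (nodes)
COLUMNS = PAGES + ["Error"]       # matrix columns include the error node

# ubmg[i][j] = probability of moving from PAGES[i] to COLUMNS[j]
ubmg = [
    # a    b    c    Error
    [0.0, 0.6, 0.3, 0.1],   # transitions out of page a
    [0.0, 0.0, 0.8, 0.2],   # transitions out of page b
    [0.0, 0.0, 0.0, 0.0],   # page c: the session ends here in this sketch
]

def transition_probability(src: str, dst: str) -> float:
    """Return the probability of transitioning from page src to column dst."""
    return ubmg[PAGES.index(src)][COLUMNS.index(dst)]

if __name__ == "__main__":
    print(transition_probability("a", "Error"))   # 0.1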
A sample format of an IIS log file is shown in Figure 1.

#Fields: date time c-ip s-port cs-uri-stem cs-uri-query sc-status time-taken cs(User-Agent) cs(Referrer)

Figure 1. Format of the IIS server log file
<Date and Time> <Client-id> <URL> <Referrer-id>
2007-01-19 00:00:00 203.124.225.19 a.asp -
2007-01-19 00:00:02 203.124.225.19 b.asp -
2007-01-19 00:00:03 203.124.225.19 c.asp d.asp
2007-01-19 00:00:05 203.124.225.19 e.asp f.asp
2007-01-19 00:00:06 203.124.225.19 c.asp b.asp
2007-01-19 00:00:07 203.124.225.19 f.asp a.asp
2007-01-19 00:00:10 203.124.225.19 d.asp e.asp
Figure 2. Access log entries of the IIS server
We use the referrer-id and client-id fields as the basis for a depth-first search on the access logs. This approach segregates valid and invalid sessions. To understand this, consider an application with only two independent sessions: S1 with pages (a -> b -> c) and S2 with pages (d -> e -> f). Let the access log have the entries shown in Figure 2. If the depth-first search were based only on the client-id field, we would derive two valid sessions. However, with the referrer-id field we also detect the invalid path consisting of pages (a -> b -> f). The count of all such invalid sessions is determined, and the UBMG is constructed only for the valid sessions. Consider the example of an Online Shipping System (OSS), where the two sessions defined in the navigational model are Session 1, "Export a package", with pages PackageSelection.asp -> PackageDetails.asp -> Export.asp -> DeliveryLogistics.asp -> Payment.asp, and Session 2, "Import a package", with pages PackageSelection.asp -> PackageDetails.asp -> Import.asp -> DeliveryLogistics.asp -> Payment.asp. We tag an alias for each page as given in Figure 3:

a - PackageSelection.asp; b - PackageDetails.asp; c - Export.asp; d - Import.asp; e - DeliveryLogistics.asp; f - Payment.asp
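The sketch below, a simplification introduced here for illustration rather than the authors' algorithm, shows one way such segregation could be coded: instead of the full depth-first search, it simply checks each logged (referrer, page) transition of a client against an assumed navigational model, using the four-column log format of Figure 2.

# Illustrative sketch: segregating valid and invalid transitions from access-log
# entries of the form (timestamp, client_id, url, referrer) as in Figure 2.
# The navigational model is assumed to be given as a set of allowed
# (referrer_page, page) transitions; entry pages have referrer "-".

from collections import defaultdict

# Hypothetical navigational model for the two sessions S1: a->b->c, S2: d->e->f
ALLOWED = {("-", "a.asp"), ("a.asp", "b.asp"), ("b.asp", "c.asp"),
           ("-", "d.asp"), ("d.asp", "e.asp"), ("e.asp", "f.asp")}

LOG = [
    ("2007-01-19 00:00:00", "203.124.225.19", "a.asp", "-"),
    ("2007-01-19 00:00:02", "203.124.225.19", "b.asp", "a.asp"),
    ("2007-01-19 00:00:03", "203.124.225.19", "f.asp", "b.asp"),   # invalid hop
]

def segregate(log, allowed):
    """Split per-client page transitions into valid and invalid ones."""
    valid, invalid = defaultdict(list), defaultdict(list)
    for _ts, client, url, referrer in log:
        target = valid if (referrer, url) in allowed else invalid
        target[client].append((referrer, url))
    return valid, invalid

if __name__ == "__main__":
    ok, bad = segregate(LOG, ALLOWED)
    print(len(bad["203.124.225.19"]))   # 1 invalid transition detected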
Figure 3 shows the graphical view of the UBMG with the exit node g. The matrix of transition probabilities for this graph is shown in Table 1. The matrix considers only those sessions that completed successfully. For example, the sum of the probabilities of the paths out of node b is 0.9, indicating that 10% of clients either dropped out or encountered errors.

Figure 3. Graphical view of the UBMG

Figure 4. Addition of an error node to the UBMG

The probability of reaching a node j in the graph can be calculated using the Markov property [7, 8, 9]. The generalized relation is N_j = N_1 * P(1, j) + N_2 * P(2, j) + ... + N_k * P(k, j), where k is the number of nodes that lead to node j. In the OSS example, the probability of reaching node b is 0.4 * N_a + 0.2 * N_b + 0.2 * N_c + 0.2 * N_d, and the probability of reaching node e is 0.1 * N_c + 0.1 * N_d + 0.3 * N_e, where N_a is always equal to one. The complexity of the graphs in Figures 3, 4 and 5 can be calculated using the cyclomatic complexity (e - n + 2, or e - n + 1, or the number of nodes with two outgoing edges plus 1, where e is the total number of edges and n is the total number of nodes).
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  |
  a   |     | 0.4 |     |     |     |     |     |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |
  c   |     | 0.2 |     |     | 0.1 |     |     |
  d   |     | 0.2 |     |     | 0.1 |     |     |
  e   |     |     |     |     | 0.3 | 0.6 |     |
  f   |     |     |     |     |     |     | 0.7 |

Table 1. Matrix of transition probabilities for the OSS
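As an illustration of the relation N_j = N_1 * P(1, j) + ... + N_k * P(k, j) and of the cyclomatic complexity e - n + 2, the Python sketch below applies both to the Table 1 data; the fixed-point iteration used to solve the simultaneous equations is an assumption made here for brevity, not the solution technique prescribed by the paper.

# Illustrative sketch: expected visit rates N_j for the Table 1 UBMG using the
# Markov relation N_j = sum_i N_i * P(i, j), solved by fixed-point iteration,
# plus the cyclomatic complexity e - n + 2 of the graph.

NODES = ["a", "b", "c", "d", "e", "f", "g"]

# Transition probabilities taken from Table 1 (missing cells are zero).
P = {
    ("a", "b"): 0.4,
    ("b", "b"): 0.2, ("b", "c"): 0.3, ("b", "d"): 0.5,
    ("c", "b"): 0.2, ("c", "e"): 0.1,
    ("d", "b"): 0.2, ("d", "e"): 0.1,
    ("e", "e"): 0.3, ("e", "f"): 0.6,
    ("f", "g"): 0.7,
}

def visit_rates(p, nodes, entry="a", iterations=200):
    """Iteratively apply N_j = sum_i N_i * P(i, j) with N_entry fixed at 1."""
    n = {node: 0.0 for node in nodes}
    n[entry] = 1.0
    for _ in range(iterations):
        for j in nodes:
            if j == entry:
                continue
            n[j] = sum(n[i] * p.get((i, j), 0.0) for i in nodes)
    return n

def cyclomatic_complexity(p, nodes):
    """V(G) = e - n + 2, where e = number of edges and n = number of nodes."""
    return len(p) - len(nodes) + 2

if __name__ == "__main__":
    rates = visit_rates(P, NODES)
    print({k: round(v, 3) for k, v in rates.items()})
    print("V(G) =", cyclomatic_complexity(P, NODES))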
4.1 Failure Analysis of the UBMG
Now we extend the UBMG to include failure data. To capture the failure data, the access logs are scanned for HTTP return error codes of 4xx and 5xx, as mentioned in [10]. Besides this, errors from other servers are also considered. In principle, the error node can be reached from any page in the graphical view. We add the error node Er, and all page errors are associated with this node. The matrix of transition probabilities then has an additional column representing the error node. A cell (m, Er) of this column holds the probability of transitioning from node m to the error node Er.

Figure 5. Addition of an error node and the virtual nodes to the UBMG
Considering the OSS example, the view of the UBMG with the error node added is shown in Figure 4. The matrix of transition probabilities for Figure 4 is shown in Table 2. The matrix considers only those sessions that encountered some error; of all the requests that entered node c, 50% encountered an error. Before proceeding to the analysis of failures due to service-level agreement (SLA) violations, we define the Session Response Time (SRT) as the sum of the service times of all the pages in a session. We define the SLA at the session level and hence need the desired response time target for each session. The access log files can be used to determine the page service time (PST) values.
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  | Error |
  a   |     | 0.4 |     |     |     |     |     |       |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |       |
  c   |     | 0.2 |     |     | 0.1 |     |     |  0.5  |
  d   |     | 0.2 |     |     | 0.1 |     |     |       |
  e   |     |     |     |     | 0.3 | 0.6 |     |  0.1  |
  f   |     |     |     |     |     |     | 0.7 |       |

Table 2. Matrix of transition probabilities with the error node
For example, in the IIS Web server the time-taken field represents the time spent by the server to respond to the request. The SRT of a session is computed as the sum of the PSTs of its individual pages. Further, we compute the number of successful sessions in which the SLA was violated. Let S1 and S2 be two sessions of the OSS example. Table 3 shows the session information, where each session is represented by a unique column and includes the number of successful sessions, the number of instances of SLA violation, etc. The probability of reaching the exit node for a session is computed as the ratio of the number of exits to the number of visits at the entry page. Figure 5 shows the addition of virtual nodes to the graph of Figure 4. The matrix of transition probabilities for Figure 5 is shown in Table 4.
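A small Python sketch of this bookkeeping is given below; it is only an illustration written for this edition, with hypothetical page service times, the OSS session page lists under their Figure 3 aliases, and an assumed SLA target of 6 seconds.

# Illustrative sketch: Session Response Time (SRT) as the sum of the page
# service times (PST) of a session, and checking the SLA for each session.
# All numbers here are hypothetical.

# PST values (seconds) as they might be derived from the IIS time-taken field.
PST = {"a.asp": 1.2, "b.asp": 0.8, "c.asp": 2.5, "d.asp": 2.1,
       "e.asp": 0.9, "f.asp": 1.0}

SESSIONS = {
    "S1 (Export a package)": ["a.asp", "b.asp", "c.asp", "e.asp", "f.asp"],
    "S2 (Import a package)": ["a.asp", "b.asp", "d.asp", "e.asp", "f.asp"],
}

SLA_TARGET = 6.0   # assumed desired session response time in seconds

def session_response_time(pages, pst):
    """SRT = sum of the PSTs of the individual pages in the session."""
    return sum(pst[p] for p in pages)

if __name__ == "__main__":
    for name, pages in SESSIONS.items():
        srt = session_response_time(pages, PST)
        status = "SLA violated" if srt > SLA_TARGET else "SLA met"
        print(name, round(srt, 2), status)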
4.2 Calculation of Reliability
To account for software code-level failures, we determine the probability of encountering the failure node, P_CODE-ERROR, represented in Figure 4. To obtain this probability of reaching the error node, we formulate a set of equations from the matrix and solve the simultaneous equations using techniques such as Cramer's Rule, Matrix Inversion or Gauss-Jordan Elimination. We also compute (a) the total number of failures due to invalid sessions, N_INVALID-SESSION, and (b) the number of instances where successful sessions did not meet the SLA, N_SLA-FAIL. The probability of occurrence of invalid sessions is computed using (a). The probability of failure of a session due to (b) is computed with respect to the total number of its successful sessions; in the OSS example, the probability of such failures is 0.59 for Session 1 and 0.56 for Session 2. The probability of a session reaching the exit node but violating the SLA, and the probability of invalid sessions, also need to be computed. The total session failure probability P_SESSION-FAILURE is calculated as the sum of all the individual session failure probabilities and the probability of occurrence of invalid sessions. The overall probability of failure P_TOTAL-FAILURE of the system is calculated as the sum of the probability of reaching the error node, P_CODE-ERROR, and the probability of session failure, P_SESSION-FAILURE, for the entire system. The overall reliability R_SYSTEM of the system is then given by R_SYSTEM = 1 - P_TOTAL-FAILURE. Thus the reliability computation is driven by failures at the software code level, failures due to SLA violations, and invalid sessions.
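The composition just described can be illustrated with the short Python sketch below. The per-session SLA-violation probabilities are the Table 3 values, while the values of P_CODE-ERROR and of the invalid-session probability are hypothetical placeholders, since solving the error-node equations is outside the scope of this sketch.

# Illustrative sketch of the reliability composition described above:
#   P_SESSION-FAILURE = sum of per-session SLA-violation probabilities
#                       + probability of invalid sessions
#   P_TOTAL-FAILURE   = P_CODE-ERROR + P_SESSION-FAILURE
#   R_SYSTEM          = 1 - P_TOTAL-FAILURE
# P_CODE_ERROR and P_INVALID_SESSION below are hypothetical; the per-session
# probabilities are taken from Table 3 (row 5).

P_CODE_ERROR = 0.05                 # assumed: solved from the error-node equations
P_INVALID_SESSION = 0.02            # assumed: invalid sessions / all sessions
SESSION_SLA_VIOLATION = {"S1": 0.37, "S2": 0.32}   # Table 3, row 5

def system_reliability(p_code_error, p_invalid, per_session):
    """R_SYSTEM = 1 - (P_CODE-ERROR + P_SESSION-FAILURE)."""
    p_session_failure = sum(per_session.values()) + p_invalid
    p_total_failure = p_code_error + p_session_failure
    return 1.0 - p_total_failure

if __name__ == "__main__":
    print(round(system_reliability(P_CODE_ERROR, P_INVALID_SESSION,
                                   SESSION_SLA_VIOLATION), 3))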
Sessions                                                             | S1   | S2   |
1. Total no. of successful sessions                                  | 125  | 150  |
2. Total no. of SLA violations, N_SLA-FAIL                           | 64   | 67   |
3. Probability of failure due to (2)                                 | 0.59 | 0.56 |
4. Probability of reaching the exit node for each session            | 0.78 | 0.76 |
5. Probability of SLA violation for each session, using (3) and (4)  | 0.37 | 0.32 |

Table 3. Results of the SLA violation probability
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  | Error |
  a   |     | 0.4 |     |     |     |     | 0.7 |  0.6  |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |       |
  c   |     | 0.2 |     |     | 0.1 |     |     |  0.5  |
  d   |     | 0.2 |     |     | 0.1 |     |     |       |
  e   |     |     |     |     | 0.3 | 0.6 |     |  0.1  |
  f   |     |     |     |     |     |     | 0.7 |       |

Table 4. Matrix of transition probabilities with the error and virtual nodes
5. Effort Estimation with the RSWAEA Method
In order to help the expert achieve more accurate effort estimates, the RSWAEA method introduces a new sizing metric based on the data model of the Web-based information system to be developed: Data Web Objects (DWO). DWO is an indirect sizing metric that takes into account the characteristics of small to large-size projects. The idea behind the DWO is to identify the system functionality by analyzing its data model. DWOs are similar to other indirect metrics, such as Function Points [11], Object Points [12] or Web Objects [13], in that they represent abstract concepts used to obtain the size of the system to be developed. Thus, we can fill in the table of DWOs (Table 5) to calculate the system size. The weight assigned to each category of DWO represents the development effort of that category, and it is based on the experience of the expert estimator. As already discussed in Part-I of this paper, effort estimation methods based on the combination of WebMo and Web Objects are not appropriate for estimating the development effort of Web-based applications. Therefore, the RSWAEA method intends to be more appropriate for estimating the development effort of small to medium-size projects, especially in circumstances that require fast estimation with little historical information. Let us continue with the Visual Basic Script Web application example used previously for the UBMG. The DWO count of 96 in Table 5 represents the size of the program that would be required for this Web application.
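Since the DWO count is simply a weighted sum over the data-model categories, the following short Python sketch reproduces the 96-DWO figure from the category counts and expert-assigned weights listed in Table 5.

# Illustrative sketch: total DWO size as the weighted sum of the data-model
# categories of Table 5. Counts and weights are the example values from the
# table; in practice the weights come from the expert estimator.

DWO_CATEGORIES = [
    # (category, count, weight)
    ("Regular Entities",           5,  8),
    ("Dependent Entities",         1, 10),
    ("Relationship Entities",      3,  3),
    ("Relationship 1 to 1",        1,  3),
    ("Relationship 1 to N",        3,  6),
    ("Number of multimedia files", 2,  6),
    ("Number of scripts",          1,  4),
]

def total_dwo(categories):
    """Sum of count * weight over all DWO categories."""
    return sum(count * weight for _name, count, weight in categories)

if __name__ == "__main__":
    print(total_dwo(DWO_CATEGORIES))   # 96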
The last adjustable coefficient in RSWAEA is the constant P, the exponent applied to DWO*. This exponent is very close to 1.01, and it must be neither higher than 1.12 nor lower than 0.99. The Cost of User (CU) takes values between 0 and 5. A CU value of 0 means the system reuses all the functionality associated with that user type, so its cost is zero; on the other hand, a CU value of 5 means that there is no reuse of any functionality for that user type. The value of X* (the coefficient of DWO representativeness) lies between 1 and 1.3, depending on whether the Web-based application is small or large. The CU is a function of the user types to be supported by the system. The RSWAEA method considers three user types: Project manager, Web-designer and Counselor. The Project manager is in charge of supervising the available applications in the system, activating and deactivating functional areas of the system, and maintaining the set of applications that keep the project in constant execution. The Web-designer uses the functionality available in the system to modify and consult the stored information. The Counselor has read-only access to part of the information available in the system. In addition, the RSWAEA method also considers variable users, which are a mix of the aforementioned types, as shown in Table 6. Finally, the RSWAEA method uses a series of Cost Drivers taken from the WebMo model proposed by Reifer [13]. These Cost Drivers represent the possible development scenarios for a particular project. Such scenarios have positive and negative influences on the development process that need to be taken into account during estimation. Cost Drivers are subjective factors in the RSWAEA method, and their values are given in Table 7. Generally, Web objects consist of the Web documents (including empty and non-empty tags, image files, sound files and scripts).
Type of DWO                | Amount of DWO x Weight Factor | Total DWO |
Regular Entities           | 5 x 8                         | 40        |
Dependent Entities         | 1 x 10                        | 10        |
Relationship Entities      | 3 x 3                         | 9         |
Relationship 1 to 1        | 1 x 3                         | 3         |
Relationship 1 to N        | 3 x 6                         | 18        |
Number of multimedia files | 2 x 6                         | 12        |
Number of scripts          | 1 x 4                         | 4         |
Total of DWO               |                               | 96        |

Table 5. Definition of the DWO amounts
               | User Type       | Fraction of the Scope (I) | Reuse Degree (R) |
Fixed Users    | Project manager | 0.4                       | 0.1              |
               | Web-designer    | 0.7                       | 0.5              |
               | Counselor       | 0.8                       | 0.9              |
Variable Users | Secretary       | 0.2                       | 1.0              |
               | Area Manager    | 0.3                       | 0.9              |

Table 6. Example of a user table for different user types in a Web-based application
For this model nine cost drivers are defined: PRCLX: Product reliability and complexity (product attributes); PFDIF: Platform difficulty (platform and net server volatility); PECAP: Personnel capabilities (knowledge, skills and abilities of the work force); PEEXP: Experience of the personnel (depth and breadth of the work force experience); FACIL: Facility and infrastructure (tools, equipment and geographical distribution); SCHED: Scheduling (degree of risk assumed if the delivery time is shortened); CLIEN: Client type (technological knowledge of the client; requirements stability); WTEAM: Work team (ability to work synergistically as a team); and PROEFF: Process efficiency (development process efficiency). Each of these cost drivers is classified on a five-level scale: very low, low, normal, high and very high (VL, L, N, H, VH). To determine which level corresponds to each cost driver, the estimator uses a series of predefined tables built from historical information. Each cost driver has an assigned value for each level, and the product of these values enters the equation for calculating the effort in the RSWAEA method. The assigned values are substituted into the RSWAEA effort estimation equation to obtain the result in man-hours.
Cost drivers for the RSWAEA method
Driver | VL   | L    | N    | H    | VH   |
PRCLX  | 0.64 | 0.84 | 1.00 | 1.32 | 1.61 |
PFDIF  | 0.85 | 0.95 | 1.05 | 1.28 | 1.70 |
PECAP  | 1.52 | 1.28 | 1.02 | 0.92 | 0.85 |
PEEXP  | 1.30 | 1.14 | 1.02 | 0.90 | 0.85 |
FACIL  | 1.35 | 1.17 | 1.00 | 0.90 | 0.90 |
SCHED  | 1.40 | 1.18 | 1.00 | 0.95 | 0.95 |
CLIEN  | 1.45 | 1.25 | 1.04 | 0.88 | 0.80 |
WTEAM  | 1.45 | 1.25 | 1.00 | 0.88 | 0.85 |
PROEFF | 1.30 | 1.15 | 1.05 | 0.90 | 0.70 |

Table 7. Cost drivers of the RSWAEA method and their values
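The exact RSWAEA effort equation is given in Part-I of this paper and is not reproduced here. Purely as an illustration of how the ingredients of this section fit together, the Python sketch below combines the DWO size, the exponent P, the coefficient X*, an aggregate Cost of User and the product of the Table 7 cost-driver values in an assumed multiplicative, WebMo-style form; the specific formula used is a placeholder assumption, not the authors' equation.

# Illustrative sketch only: an assumed multiplicative combination of the RSWAEA
# ingredients described above. The real RSWAEA equation is given in Part-I of
# the paper; the functional form used here is a placeholder assumption.

from math import prod

# Cost-driver ratings for a hypothetical project, looked up in Table 7.
COST_DRIVERS = {"PRCLX": 1.32, "PFDIF": 1.05, "PECAP": 1.02, "PEEXP": 1.02,
                "FACIL": 1.00, "SCHED": 1.00, "CLIEN": 1.04, "WTEAM": 1.00,
                "PROEFF": 1.05}

DWO = 96          # system size from Table 5
P_EXP = 1.01      # exponent P (must lie between 0.99 and 1.12)
X_STAR = 1.1      # coefficient of DWO representativeness (between 1 and 1.3)
CU_TOTAL = 2.0    # assumed aggregate Cost of User over the supported user types

def rswaea_effort_sketch(dwo, p_exp, x_star, cu_total, drivers):
    """Assumed form: effort grows with size (dwo ** p_exp), scaled by X*,
    a user-cost term and the product of the cost-driver values."""
    return x_star * (dwo ** p_exp) * (1.0 + cu_total) * prod(drivers.values())

if __name__ == "__main__":
    print(round(rswaea_effort_sketch(DWO, P_EXP, X_STAR, CU_TOTAL, COST_DRIVERS), 1))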
6. Conclusions and Future Work
In this paper we have introduced an approach for determining the reliability and the effort assessment/estimation of Web-based systems, in order to obtain fast and reliable effort estimates for Web-based information system development projects. These methods, tested by offline and online analysis of Web logs, yield useful metrics such as RSWAEA, UBMG, session count and SRT, and these metrics can effectively be used for the computation of reliability and effort for small to large-size Web-based applications. Although these methods do not replace the expert estimator, they provide him or her with a tool for achieving a more accurate estimate, based on real data, in a shorter time. Estimating the cost, duration and reliability of Web developments involves a number of challenges. To handle these challenges, we have analyzed many findings drawn from experienced and expert opinions. Finally, by combining the qualities of a good software metric with an accessible Web design, we found that the proposed models achieve a better effort prediction accuracy, of up to 76.5%, and an overall Web-system reliability estimate of up to 72.5%, compared with traditional methods. Our future work may include the study of lexical analysis together with COTS components to develop a complete framework for effort assessment for authoring large-volume Web-based applications.
A major part of the research reported in this paper was carried out at U.I.E.T. and D.C.S.A., K.U.K., Haryana, India. We are highly indebted to the Ernet section of K.U.K. for their gracious help and constant support while testing our proposed models on different computer systems. The authors would also like to thank those nameless individuals who worked hard to supply the data.
[1]. Pressman, R.S., Software Engineering: A Practitioner's Approach (McGraw-Hill, 1997).
[2]. IEEE Standard Glossary of Software Engineering Terminology, IEEE Std. 610.12-1990.
[3]. Ejiogu, L., Software Engineering with Formal Metrics (QED Publishing, 1991).
[4]. Roche, J.M., Software Metrics & Measurement Principles, ACM Software Engineering Notes, Vol. 19, No. 1, pp. 76-85, 1994.
[5]. Basili, V.R., & Weiss, D.M., A Methodology for Collecting Valid Software Engineering Data, IEEE Trans. Software Engineering, Vol. SE-10, pp. 728-738, 1984.
[6]. Menasce, D.A., & Almeida, V.A.F., Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning (Prentice Hall PTR, pp. 49-59, 2000).
[7]. Sengupta, S., Characterizing Web Workloads: a Transaction-Oriented View, IEEE/IFIP 5th International Workshop on Distributed Computing (IWDC 2003), 2003.
[8]. Wang, W.-L., & Tang, M.-H., User-Oriented Reliability Modeling for a Web System, 14th International Symposium on Software Reliability Engineering (ISSRE), November 17-21, 2003.
[9]. Menasce, D.A., Almeida, V.A.F., Fonseca, R., & Mendes, M.A., A Methodology for Workload Characterization of E-Commerce Sites, Proceedings of the 1st ACM Conference on Electronic Commerce, 1999.
[10]. The Internet Society, Request for Comments (RFC) 2616: Hypertext Transfer Protocol, HTTP/1.1, http://www.w3.org/Protocols/rfc2616/rfc2616.html.
[11]. International Function Point Users Group, Function Point Counting Practices Manual, http://www.ifpug.org/publications/manual.htm.
[12]. Boehm, B., Anchoring the Software Process, IEEE Software, Vol. 13, No. 4, pp. 73-82, July 1996.
[13]. Reifer, D.J., Web Development: Estimating Quick-to-Market Software, IEEE Software, Vol. 17, No. 6, pp. 57-64, November-December 2000.
About the authors:
Sanjeev Dhawan is a Lecturer in Computer Science & Engineering at Kurukshetra University, Kurukshetra, Haryana. He holds postgraduate degrees of Master of Science (M.Sc.) in Electronics, Master of Technology (M.Tech.) in Computer Science & Engineering, and Master of Computer Applications (M.C.A.), all from Kurukshetra University. At present he is pursuing a PhD in Computer Science at Kurukshetra University. His current research interests include web engineering, advanced computer architectures, Intel microprocessors, programming languages and bio-molecular computing.
Rakesh Kumar received his PhD in Computer Science and his M.C.A. from Kurukshetra University, Kurukshetra, Haryana. He is currently a Senior Lecturer in the Department of Computer Science & Applications, Kurukshetra University. His current research focuses on programming languages, information retrieval systems, software engineering, artificial intelligence, and compiler design.